A Guide to DataFrames - Maple Help
For the best experience, we recommend viewing online help using Google Chrome or Microsoft Edge.

Online Help

All Products    Maple    MapleSim

A Guide to Data Frames

A DataFrame is one of the basic data structures in Maple. Data frames are a list of variables, known as DataSeries, which are displayed in a rectangular grid. Every column (variable) in a DataFrame has the same length, however, each variable can have a different type, such as integer, float, string, name, truefalse, etc., which makes data frames an ideal storage device for heterogeneous data.

When printed, Data frames resemble matrices in that they are viewed as a rectangular grid, but a key difference is that the first row corresponds to the column (variable) names, and the first column corresponds to the row (individual) names. These row and columns are treated as header meta-information and are not a part of the data. Moreover, the data stored in a DataFrame can be accessed using these header names, as well as by the standard numbered index.

The following example page shows a number of common tasks when working with data frames.


Creating a new DataFrame

Describing a DataFrame

Indexing entries in data frames

Changing values in a DataFrame

Details on DataFrame structure

How to change column names

Adding a new column or row

Removing a column or row

Selecting or Removing values that match a criteria

Dealing with missing values

Removing duplicates

Reordering columns

Reordering rows

Sorting A DataFrame

Filtering data in a DataFrame

What does with do?

Reshaping data frames

Applying a function to columns of a DataFrame

Changing the datatype of a column

Converting a DataFrame to other data types

Creating an empty DataFrame

Importing data into a DataFrame

Plotting data from a DataFrame

More examples

Creating a new DataFrame

Data frames are rich containers for information that can store multiple types of data. For example, the following vectors contain information on three berries: Raspberries, Grapes and Strawberries respectively.




Type: string

genus := < "Rubus", "Vitis", "Fragaria" >:

Nutrition: Energy per 100g in kJ


Type: integer

energy := < 220, 288, 136 >:

Nutrition: Carbohydrates per 100g in g


Type: float

carbohydrates := < 11.94, 18.1, 7.68 >:

Tons produced worldwide in 2011


Type: integer

total_tons := < 543421, 58500118, 4594539 >:

Top producing country in 2011


Type: name

top_producer := < Russia, China, USA >:

A DataFrame can combine all of these different types of data into one structure.

DF := DataFrame( < energy | carbohydrates | total_tons | top_producer | genus > );



It can be observed that in the above DataFrame, the row and column names are equivalent to the index value. The row and column names can be specified using the rows and columns options respectively. It is also useful to specify the types of data in each DataSeries using the datatypes option.

DF := DataFrame( < energy | carbohydrates | total_tons | top_producer | genus >,
            columns = [ Energy, Carbohydrates, `Total Tons`, `Top Producer`, Genus ],
            rows = [ Raspberry, Grape, Strawberry ],
            datatypes = [ integer, float, integer, anything, string ] );

DFEnergyCarbohydratesTotal TonsTop ProducerGenusRaspberry22011.9400000000000543421RussiaRubusGrape28818.100000000000058500118ChinaVitisStrawberry1367.680000000000004594539USAFragaria


Note: It is important to remember that any vectors included in the DataFrame should have the same length. If the vectors are of different length, an error is returned.

Describing a DataFrame

The Describe command prints a description of the structure of the DataFrame including the number of observations ( rows ), the number of variables ( columns ), as well as the type of each column (if specified). In addition, for numeric columns, the minimum and maximum values are displayed, and for truefalse, string or name columns, the distinct levels are given.

Describe( DF );

DF :: DataFrame: 3 observations for 5 variables
Energy:         Type: integer   Min: 136.000000  Max: 288.000000
Carbohydrates:  Type: float[8]  Min: 7.680000  Max: 18.100000
Total Tons:     Type: integer   Min: 543421.000000  Max: 58500118.000000
Top Producer:   Type: anything  Tally: [China = 1, USA = 1, Russia = 1]
Genus:          Type: string    Tally: ["Fragaria" = 1, "Rubus" = 1, "Vitis" = 1]

Indexing entries in data frames

Unlike matrices, data frames can be indexed using the column or row names. For example, to view the energy in kJ for raspberries:







Note that above, the index follows the convention [ row, column ] and a single value is retrieved. If a DataFrame is indexed by a single argument, this indexes by [ column ] and the respective column is returned. For example, to retrieve values for all berries for the Energy variable:







To return an entire row, for example, the entire row for Raspberries as a DataSeries:


Energy220Carbohydrates11.9400000000000Total Tons543421Top ProducerRussiaGenusRubus



Energy220Carbohydrates11.9400000000000Total Tons543421Top ProducerRussiaGenusRubus


For more examples, see Indexing a DataFrame or the Subsets of DataFrames example worksheet.

Changing values in a DataFrame

Changing values in an existing DataFrame can be done in a similar way to other data structures in Maple; simply assign to the index position that you want to change. For example, to change the value for Carbohydrates for Grape:

DF[Grape, Carbohydrates] := 20:


EnergyCarbohydratesTotal TonsTop ProducerGenusRaspberry22011.9400000000000543421RussiaRubusGrape28820.58500118ChinaVitisStrawberry1367.680000000000004594539USAFragaria


Before moving on, we can set this back to its original value:

DF[2,2] := 18.1:


EnergyCarbohydratesTotal TonsTop ProducerGenusRaspberry22011.9400000000000543421RussiaRubusGrape28818.100000000000058500118ChinaVitisStrawberry1367.680000000000004594539USAFragaria


Details on DataFrame structure

There are many useful commands for querying information about the structure of data frames. For example, numelems returned the total number of elements in the DataFrame:

numelems( DF );



The upperbound command returns the dimensions of the DataFrame.

upperbound( DF );



To get the number of rows in the DataFrame, put 1 as the second argument to upperbound:

upperbound( DF, 1 );



To get the number of columns in the DataFrame, put 2 as the second argument to upperbound:

upperbound( DF, 2 );



The Datatypes command returns a list of the data type of each of the columns of data:

Datatypes( DF );



How to change column names

Data frames can have both row and column names. In the above example, these were added to the DataFrame using the columns and rows options. To retrieve the current list of column names, the ColumnLabels command can be used:

ColumnLabels( DF );

Energy&comma;Carbohydrates&comma;Total Tons&comma;Top Producer&comma;Genus


The SubsColumnLabel command returns a DataFrame with a modified column label:

DF := SubsColumnLabel( DF, 4, Country );

DFEnergyCarbohydratesTotal TonsCountryGenusRaspberry22011.9400000000000543421RussiaRubusGrape28818.100000000000058500118ChinaVitisStrawberry1367.680000000000004594539USAFragaria


Adding a new column or row

The Append command appends another column or row to an existing DataFrame. The arguments for the Append command specify the DataFrame to append to, the data to append, and optionally, the orientation and the name for the new data (which can be a DataFrame or DataSeries). The following adds a new column to the DataFrame corresponding to truefalse values for if the given berry is a botanical berry or not:

DF := Append( DF, < false, true, false >, label = `Botanical Berry` );

DFEnergyCarbohydratesTotal TonsCountryGenusBotanical BerryRaspberry22011.9400000000000543421RussiaRubusfalseGrape28818.100000000000058500118ChinaVitistrueStrawberry1367.680000000000004594539USAFragariafalse


If we have a new row of information on blackberries such as:

NewRow := DataSeries( < 180 | 9.61 | Mexico | "Rubus" | false >,
            labels = [ Energy, Carbohydrates, Country, Genus, `Botanical Berry` ] );

NewRowEnergy180Carbohydrates9.61CountryMexicoGenusRubusBotanical Berryfalse


This row can be added to the existing DataFrame using the Append command.

DF := Append( DF, NewRow, mode = row, label = Blackberry );

DFEnergyCarbohydratesTotal TonsCountryGenusBotanical BerryRaspberry22011.9400000000000543421RussiaRubusfalseGrape28818.100000000000058500118ChinaVitistrueStrawberry1367.680000000000004594539USAFragariafalseBlackberry1809.61undefinedMexicoRubusfalse


Note: The value of the Total tons produced in 2011 is unknown for this DataSeries. When this is appended to the original DataFrame, since there is no value for Total Tons, the value undefined is added where the data is unknown. For more on missing values, see the Dealing with missing values section.

Removing a column or row

The Remove command removes a selected column or row from the DataFrame. This command is very useful when doing further analysis using commands that require the arguments to be purely numeric, such as visualizations in Statistics, etc.

For example, to remove the non-numeric Genus, Country and Botanical Berry columns:

Remove( DF, [ Genus, Country, `Botanical Berry` ] );

EnergyCarbohydratesTotal TonsRaspberry22011.9400000000000543421Grape28818.100000000000058500118Strawberry1367.680000000000004594539Blackberry1809.61undefined


To remove a row, use the mode option:

Remove( DF, Grape, mode = row );

EnergyCarbohydratesTotal TonsCountryGenusBotanical BerryRaspberry22011.9400000000000543421RussiaRubusfalseStrawberry1367.680000000000004594539USAFragariafalseBlackberry1809.61undefinedMexicoRubusfalse


Selecting or Removing values that match a criteria

The remove, select, and selectremove commands are useful when removing (or selecting) rows that contain values that match a given criteria.

For example, the selectremove command returns two results - one that matches the given criteria and one that does not. With the given criteria that the rows contain even values for Total Tons:

selectremove( x->type(x,even), DF, `Total Tons`);

EnergyCarbohydratesTotal TonsCountryGenusBotanical BerryGrape28818.100000000000058500118ChinaVitistrue,EnergyCarbohydratesTotal TonsCountryGenusBotanical BerryRaspberry22011.9400000000000543421RussiaRubusfalseStrawberry1367.680000000000004594539USAFragariafalseBlackberry1809.61undefinedMexicoRubusfalse


Dealing with missing values

Some datasets may have missing values. In the DataFrame for blackberries, it can be observed that the value for Total Tons is undefined:


EnergyCarbohydratesTotal TonsCountryGenusBotanical BerryRaspberry22011.9400000000000543421RussiaRubusfalseGrape28818.100000000000058500118ChinaVitistrueStrawberry1367.680000000000004594539USAFragariafalseBlackberry1809.61undefinedMexicoRubusfalse


There are several commands for dealing with missing values in data frames including DropMissing and FillMissing.

FillMissing can be used to fill missing values with another value. For example, here we can fill the undefined value in the DataFrame with 0:

FillMissing( DF, 0 );

EnergyCarbohydratesTotal TonsCountryGenusBotanical BerryRaspberry22011.9400000000000543421RussiaRubusfalseGrape28818.100000000000058500118ChinaVitistrueStrawberry1367.680000000000004594539USAFragariafalseBlackberry1809.610MexicoRubusfalse


Note: In order to permanently change the value for any missing values, the result of the FillMissing command must be assigned to the DataFrame.

Removing duplicates

Duplicate rows (or duplicate entries in a given column) do occur in many datasets. Say one was to mistakenly add another row to our DataFrame that contains the same information about blackberries, but the row is added under a misspelled label:

DF := Append( DF, DataFrame( << 180 | 9.61 | Mexico | "Rubus" | false >>,
           columns = [ Energy, Carbohydrates, Country, Genus, `Botanical Berry` ],
           rows = [ Blackbery ] ) );

DFEnergyCarbohydratesTotal TonsCountryGenusBotanical BerryRaspberry22011.9400000000000543421RussiaRubusfalseGrape28818.100000000000058500118ChinaVitistrueStrawberry1367.680000000000004594539USAFragariafalseBlackberry1809.61undefinedMexicoRubusfalseBlackbery1809.61undefinedMexicoRubusfalse


By inspecting the DataFrame above, it is easy to see the duplicate entry, but duplicates may not be easy to find by inspection when dealing with larger DataFrames.

The AreDuplicate command returns a DataSeries of truefalse values that indicate if a row has a duplicate or not.

AreDuplicate( DF );



This result can be used to index a DataFrame to show just the duplicate rows:

DF[ AreDuplicate( DF ) ];

EnergyCarbohydratesTotal TonsCountryGenusBotanical BerryBlackberry1809.61undefinedMexicoRubusfalseBlackbery1809.61undefinedMexicoRubusfalse


The AreUnique command can be used to show the unique rows in a DataFrame. If a DataFrame is indexed by the results, the duplicate rows are removed. By default, the AreUnique command marks the first instance of a row as unique and any subsequent instance of a matching row as duplicate.

DF[ AreUnique( DF ) ];

EnergyCarbohydratesTotal TonsCountryGenusBotanical BerryRaspberry22011.9400000000000543421RussiaRubusfalseGrape28818.100000000000058500118ChinaVitistrueStrawberry1367.680000000000004594539USAFragariafalseBlackberry1809.61undefinedMexicoRubusfalse


The keep option controls if the first, last, or none of the duplicates is marked as unique.

DF[ AreUnique( DF, keep = last ) ];

EnergyCarbohydratesTotal TonsCountryGenusBotanical BerryRaspberry22011.9400000000000543421RussiaRubusfalseGrape28818.100000000000058500118ChinaVitistrueStrawberry1367.680000000000004594539USAFragariafalseBlackbery1809.61undefinedMexicoRubusfalse


DF := DF[ AreUnique( DF, keep = first ) ];

DFEnergyCarbohydratesTotal TonsCountryGenusBotanical BerryRaspberry22011.9400000000000543421RussiaRubusfalseGrape28818.100000000000058500118ChinaVitistrueStrawberry1367.680000000000004594539USAFragariafalseBlackberry1809.61undefinedMexicoRubusfalse


Reordering columns

In the How to change column names section, the ColumnLabels command was used to return the list of variable names. The ColumnLabels command also returns the current order of the columns:

ColumnLabels( DF );

Energy&comma;Carbohydrates&comma;Total Tons&comma;Country&comma;Genus&comma;Botanical Berry


To reorder columns in a DataFrame, use the DataFrame index notation to specify the new order for the columns and reassign this to the existing DataFrame variable.

DF := DF[ [ Genus, Carbohydrates, Energy, Country, `Total Tons`, `Botanical Berry` ] ];

DFGenusCarbohydratesEnergyCountryTotal TonsBotanical BerryRaspberryRubus11.9400000000000220Russia543421falseGrapeVitis18.1000000000000288China58500118trueStrawberryFragaria7.68000000000000136USA4594539falseBlackberryRubus9.61180Mexicoundefinedfalse


Note: It is not recommended that DataFrames with strictly numeric indexed column labels are reordered. This may lead to a conflict between the column label and the index position.

Reordering rows

In the Reordering columns section, the ColumnLabels command was used to return the current order of the columns. The RowLabels command returns the current order of the rows:

RowLabels( DF );



To reorder rows in a DataFrame, use the DataFrame index notation to specify the new order for the rows and reassign this to the existing DataFrame variable.

DF := DF[ [ Blackberry, Grape, Raspberry, Strawberry ], .. ];

DFGenusCarbohydratesEnergyCountryTotal TonsBotanical BerryBlackberryRubus9.61180MexicoundefinedfalseGrapeVitis18.1000000000000288China58500118trueRaspberryRubus11.9400000000000220Russia543421falseStrawberryFragaria7.68000000000000136USA4594539false


Note: It is not recommended that DataFrames with strictly numeric indexed row labels are reordered. This may lead to a conflict between the row label and the index position.

Sorting A DataFrame

One of the most common tasks when working with data is to order the data by ascending or descending numeric value, alphabetical order, or using some other ordering. The sort command orders a DataFrame by values in a selected column (DataSeries).

For example, to sort the DataFrame in order of ascending carbohydrate value:

sort( DF, Carbohydrates );

GenusCarbohydratesEnergyCountryTotal TonsBotanical BerryStrawberryFragaria7.68000000000000136USA4594539falseBlackberryRubus9.61180MexicoundefinedfalseRaspberryRubus11.9400000000000220Russia543421falseGrapeVitis18.1000000000000288China58500118true


To sort the DataFrame by descending energy value:

sort( DF, Energy, `>` );

GenusCarbohydratesEnergyCountryTotal TonsBotanical BerryGrapeVitis18.1000000000000288China58500118trueRaspberryRubus11.9400000000000220Russia543421falseBlackberryRubus9.61180MexicoundefinedfalseStrawberryFragaria7.68000000000000136USA4594539false


It is also possible to sort using a DataSeries containing strings:

sort( DF, Country );

GenusCarbohydratesEnergyCountryTotal TonsBotanical BerryGrapeVitis18.1000000000000288China58500118trueBlackberryRubus9.61180MexicoundefinedfalseRaspberryRubus11.9400000000000220Russia543421falseStrawberryFragaria7.68000000000000136USA4594539false


Filtering data in a DataFrame

A very useful aspect of data frames is that they can be queried for subsets of the DataFrame that match a given query. Queries on data frames return a truth table whose entries are either true, false or FAIL depending on if a given element meets a given criteria.

For example, to return all of the berries that have greater than 10 g of carbohydrates per 100g:

DF[ Carbohydrates ] >~ 10;



The truth table is a DataSeries that can be used to index the main DataFrame. If the DataFrame is indexed by a DataSeries of type truefalse, it returns a filtered DataFrame:

DF[ DF[ Carbohydrates ] >~ 10 ];

GenusCarbohydratesEnergyCountryTotal TonsBotanical BerryGrapeVitis18.1000000000000288China58500118trueRaspberryRubus11.9400000000000220Russia543421false


It is possible to use the add command to tally the values of any column of a DataFrame, including the truth table generated using the above query:

add( DF[ Carbohydrates ] >~ 10 );



This means that for the 4 individuals in the DataFrame, 2 out of 4 have greater then 10 g of carbohydrates per 100 g and 2 out of 4 have less than or equal to 10g of carbohydrates.


More advanced queries can be created by combining several queries into one statement using logical operators such as and, or, etc.

For example, which of the berries have less than 10 g of carbohydrates and greater than 150 kJ of energy?

DF[ DF[ Carbohydrates ] <~ 10 and DF[ Energy ] >~ 150 ];

GenusCarbohydratesEnergyCountryTotal TonsBotanical BerryBlackberryRubus9.61180Mexicoundefinedfalse


For more examples, see the Subsets of DataFrames example page.

What does with do?

The with command creates newly assigned variables that correspond to each of the column names.

with( DF );

Genus&comma;Carbohydrates&comma;Energy&comma;Country&comma;Total Tons&comma;Botanical Berry


These variables are each a DataSeries that has the same row labels as the parent DataFrame.




Binding labels is useful for simplifying the query syntax:

Energy >~ 200;



The unwith command unassigns the column names:

unwith( DF );




Reshaping data frames

The Transpose command returns the transpose of a DataFrame. This turns the variable columns into individual rows and the individual rows into variable columns.

Transpose( DF );

BlackberryGrapeRaspberryStrawberryGenusRubusVitisRubusFragariaCarbohydrates9.6118.100000000000011.94000000000007.68000000000000Energy180288220136CountryMexicoChinaRussiaUSATotal Tonsundefined585001185434214594539Botanical Berryfalsetruefalsefalse


Applying a function to columns of a DataFrame

Many top-level commands as well as commands in the Statistics package can be applied directly to data frames, however many commands do not work with data frames. For many functions, there simply is not a uniform application that can be sensibly applied to all the DataSeries in a DataFrame. In cases where one wants to apply a function to only a single DataSeries of a DataFrame, the ~ (element-wise) operator can be useful.

For example, say we wanted to round the values in the Carbohydrates column to the nearest integer value, the round command can be applied in the following way:




To change the values for this column in place, reassign the Carbohydrates column:

DF[Carbohydrates] := round~(DF[Carbohydrates]);



It may also be useful to apply a custom function to a DataSeries. For example, the following converts the values in the Energy column to Joules from KJoules.

(x -> x * 1000)~(DF[Energy]);



The following converts all the country names to strings:

(x -> convert( x, string) )~(DF[Country]);



Changing the datatype of a column

The datatypes option of the DataFrame constructor specifies the datatypes for each column of the DataFrame. The SubsDatatype command changes the value of the specified datatype and attempts to coerce the data contained in the column to that new datatype.

Datatypes( DF );



SubsDatatype( DF, Energy, float );

GenusCarbohydratesEnergyCountryTotal TonsBotanical BerryBlackberryRubus10180.MexicoundefinedfalseGrapeVitis18288.China58500118trueRaspberryRubus12220.Russia543421falseStrawberryFragaria8136.USA4594539false


It is important to note that this is not done in-place; in order to have a permanent change, reassignment is required.

DF := SubsDatatype( DF, Energy, float );

DFGenusCarbohydratesEnergyCountryTotal TonsBotanical BerryBlackberryRubus10180.MexicoundefinedfalseGrapeVitis18288.China58500118trueRaspberryRubus12220.Russia543421falseStrawberryFragaria8136.USA4594539false


Datatypes( DF );



Converting a DataFrame to other data types

It may be beneficial from time to time to convert all columns or some columns of a DataFrame to other data types in order to interact with other commands in the Maple language. The convert command can convert a DataFrame to a Matrix, table, Array or a nested list (by supplying the option nested to a conversion to list).

For example, let us convert the numeric columns of the DataFrame to a Matrix:

convert( DF[ [ Carbohydrates, Energy, `Total Tons` ] ], Matrix );



The following converts the columns with strings or names data into a nested list:

convert( DF[ [ Genus, Country ] ], list, nested );



Data frames can also be converted to Arrays:

convert( DF[ `Botanical Berry` ], Array );



In most of these cases, the conversion is lossy; the column names and row names are discarded. The only case that attempts to keep the meta-data is the conversion to table, where each table index is the row name for the entry value:

DFtable := convert( DF[ Genus ], table );



indices( DFtable );



entries( DFtable );



Creating an empty DataFrame

In order to create an empty DataFrame, supply an empty list of lists as the first argument:

NewDF := DataFrame( [[]], columns = [Column1] );



Observations can then be added to the DataFrame using the Append command. For more details, see the Adding a new column or row section.

NewDF := Append( NewDF, DataFrame( <<1>>, columns = [Column1] ) );



NewDF := Append( NewDF, [2], label = Column2 );



Importing data into a DataFrame

By default, the Import command returns data frames when importing many tabular file formats such as .csv, .tsv, .xls, or .xlsx files.

PassengerData:=Import( "datasets/air_passengers.csv", base=datadir );


Plotting data from a DataFrame

You can use the dataplot command or numerous Statistics visualization commands to plot a DataFrame.



For additional visualization examples, see the Iris Data and Statistics with DataFrames example worksheets.

More examples

There are several examples for working with DataFrames and DataSeries:


Indexing a DataFrame : Examples of selecting, rearranging, and reassigning entries from a DataFrame object


Iris Data : Examples of summarizing data, computing aggregate statistics, and principal component analysis


Subsets of DataFrames : Examples of indexing and filtering columns and rows of a DataFrame


Statistics with DataFrames : Examples of using commands from Statistics on DataFrames