Title: Linear Programming Discriminant Analysis
Description: Classification method obtained through linear programming. It is advantageous compared with classical approaches when the distribution of the variables involved is unknown or when the number of variables is much greater than the number of individuals. The LPDA method is published in Nueda et al. (2022), "LPDA: A new classification method based on linear programming" <doi:10.1371/journal.pone.0270403>.
Authors: Maria Jose Nueda <[email protected]>
Maintainer: Maria Jose Nueda <[email protected]>
License: GPL (>= 2)
Version: 1.0.1
Built: 2024-11-20 03:34:36 UTC
Source: https://github.com/cran/lpda
bestPC
computes the classification error of lpda.pca models fitted with each number of components specified in the PCs argument. The result is the average classification error rate over the R models computed for each number of PCs.
bestPC(data, group, ntest = 10, R = 10, PCs = c(10,15,20), f1 = NULL, f2 = NULL)
data: Matrix containing the data. Individuals in rows and variables in columns.
group: Vector with the group variable.
ntest: Number of samples to evaluate in the test set.
R: Number of times the model is evaluated for each number of components in PCs.
PCs: The numbers of principal components to check.
f1: Vector with weights for the individuals of the first group. If NULL they are equally weighted.
f2: Vector with weights for the individuals of the second group. If NULL they are equally weighted.
bestPC
returns a vector with the average prediction error rate obtained from the R models for each number of components specified in the PCs argument.
Maria Jose Nueda, [email protected]
data(RNAseq)
group = as.factor(rep(c("G1","G2"), each = 30))
bestPC(RNAseq, group, ntest = 10, R = 5, PCs = c(2, 10))
bestVariability
computes the classification error of lpda.pca models fitted with the number of components needed to reach each explained variability specified in the Vars argument. The result is the average classification error rate over the R models computed for each explained variability.
bestVariability(data, group, ntest = 10, R = 10, Vars = c(0.5,0.7), f1 = NULL, f2 = NULL)
data: Matrix containing the data. Individuals in rows and variables in columns.
group: Vector with the group variable.
ntest: Number of samples to evaluate in the test set.
R: Number of times the model is evaluated for each variability value in Vars.
Vars: The explained-variability values to check, from which the best variability parameter will be chosen for the lpda.pca model.
f1: Vector with weights for the individuals of the first group. If NULL they are equally weighted.
f2: Vector with weights for the individuals of the second group. If NULL they are equally weighted.
bestVariability
returns a vector with the average prediction error rate obtained from the R models for each variability specified in the Vars argument.
Maria Jose Nueda, [email protected]
data(RNAseq)
group = as.factor(rep(c("G1","G2"), each = 30))
bestVariability(RNAseq, group, ntest = 10, R = 5, Vars = c(0.1, 0.9))
CVktest
evaluates the classification error rate on k samples that do not participate in the model fit.
CVktest(data, group, scale = FALSE, pca = FALSE, PC = 2, Variability = NULL, ntest = 10, R = 10, f1 = NULL, f2 = NULL)
data: Matrix containing the data. Individuals in rows and variables in columns.
group: Vector with the group variable.
scale: Logical indicating whether the data are standardised.
pca: Logical indicating whether a reduction of dimension is required.
PC: Number of Principal Components (PC) for PCA. By default it is 2. When the number of PCs is not decided, it can be determined by choosing the desired proportion of explained variability (Variability parameter).
Variability: Parameter for Principal Component (PC) selection. This is the desired proportion of variability explained by the PCs of the variables.
ntest: Number of samples to evaluate in the test set.
R: Number of times the error is evaluated.
f1: Vector with weights for the individuals of the first group. If NULL they are equally weighted.
f2: Vector with weights for the individuals of the second group. If NULL they are equally weighted.
CVktest
The prediction error rate.
Maria Jose Nueda, [email protected]
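A minimal usage sketch, not part of the original manual, based on the signature above and on the RNAseq example data shipped with the package (the ntest and R values are arbitrary choices):
data(RNAseq)
group = as.factor(rep(c("G1","G2"), each = 30))
# Hold out 5 samples, repeat 10 times, using 2 principal components
CVktest(RNAseq, group, pca = TRUE, PC = 2, ntest = 5, R = 10)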
CVloo
evaluates the classification error rate with a leave-one-out procedure.
CVloo(data, group, scale = FALSE, pca = FALSE, PC = 2, Variability = NULL, f1 = NULL, f2 = NULL)
data: Matrix containing the data. Individuals in rows and variables in columns.
group: Vector with the group variable.
scale: Logical indicating whether the data are standardised.
pca: Logical indicating whether a reduction of dimension is required.
PC: Number of Principal Components (PC) for PCA. By default it is 2. When the number of PCs is not decided, it can be determined by choosing the desired proportion of explained variability (Variability parameter).
Variability: Parameter for Principal Component (PC) selection. This is the desired proportion of variability explained by the PCs of the variables.
f1: Vector with weights for the individuals of the first group. If NULL they are equally weighted.
f2: Vector with weights for the individuals of the second group. If NULL they are equally weighted.
CVloo
The prediction error rate.
Maria Jose Nueda, [email protected]
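A minimal usage sketch, not part of the original manual, based on the signature above and on the RNAseq example data:
data(RNAseq)
group = as.factor(rep(c("G1","G2"), each = 30))
# Leave-one-out error rate using 2 principal components
CVloo(RNAseq, group, pca = TRUE, PC = 2)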
lpda
computes a discriminating hyperplane for two groups, either with the original data (calling lpda.fit) or with principal components (calling lpda.pca).
lpda(data, group, scale = FALSE, pca = FALSE, PC = 2, Variability = NULL, f1 = NULL, f2 = NULL)
data: Matrix containing the data. Individuals in rows and variables in columns.
group: Vector with the group variable.
scale: Logical indicating whether the data are standardised. When pca = TRUE the data are always scaled.
pca: Logical indicating whether a Principal Component Analysis is required.
PC: Number of Principal Components (PC) for PCA. By default it is 2. When the number of PCs is not decided, it can be determined by choosing the desired proportion of explained variability (Variability parameter).
Variability: Parameter for Principal Component (PC) selection. This is the minimum desired proportion of variability explained by the PCs of the variables. The analysis is always done with a minimum of 2 PCs. If it is NULL the PCA will be computed with the PC parameter.
f1: Vector with weights for the individuals of the first group. If NULL they are equally weighted.
f2: Vector with weights for the individuals of the second group. If NULL they are equally weighted.
lpda
returns an object of class "lpda". The functions predict and plot can be used to obtain the predicted classes and a two-dimensional plot of the distances to the computed hyperplane for the two classes.
coef: Hyperplane coefficients.
data: The input data matrix when pca = FALSE, or the scores when pca = TRUE.
group: The input group vector.
scale: The input scale argument.
pca: The input pca argument.
loadings: Principal Component loadings. Shown when pca = TRUE.
scores: Principal Component scores. Shown when pca = TRUE.
var.exp: A matrix containing the explained variance of each component and the cumulative variance. Shown when pca = TRUE.
PCs: Number of Principal Components in the analysis. Shown when pca = TRUE.
Maria Jose Nueda, [email protected]
Nueda MJ, Gandía C, Molina MD (2022) LPDA: A new classification method based on linear programming. PLoS ONE 17(7): e0270403. <https://doi.org/10.1371/journal.pone.0270403>
######### palmdates example in lpda package:
data(palmdates)
group = as.factor( c(rep("Spanish",11), rep("Foreign",10)) )

# with concentration data:
model = lpda(data = palmdates$conc, group = group)
pred = predict(model)
table(pred$fitted, group)
plot(model, main = "Palmdates example")

model.pca = lpda(data = palmdates$conc, group = group, pca = TRUE, PC = 2)
plot(model.pca, PCscores = TRUE, main = "Palmdates example")

# with spectra data
model.pca = lpda(data = palmdates$spectra, group = group, pca = TRUE, Variability = 0.9)
model.pca$PCs # 4 PCs to explain 90% of the variability
plot(model.pca, PCscores = TRUE, main = "Spectra palmdates")
lpda.fit
computes the discriminating hyperplane for two groups, returning the coefficients of the hyperplane.
lpda.fit(data, group, f1 = NULL, f2 = NULL)
data: Matrix containing the data. Individuals in rows and variables in columns.
group: Vector with the group variable.
f1: Vector with weights for the individuals of the first group.
f2: Vector with weights for the individuals of the second group.
coef: Hyperplane coefficients.
Maria Jose Nueda, [email protected]
Nueda MJ, Gandía C, Molina MD (2022) LPDA: A new classification method based on linear programming. PLoS ONE 17(7): e0270403. <https://doi.org/10.1371/journal.pone.0270403>
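A minimal usage sketch, not part of the original manual, fitting the hyperplane directly on the palmdates concentration data (the as.matrix() coercion is only a precaution, since the argument is documented as a matrix):
data(palmdates)
group = as.factor( c(rep("Spanish",11), rep("Foreign",10)) )
fit = lpda.fit(as.matrix(palmdates$conc), group)
fit$coef  # hyperplane coefficients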
lpda.pca
computes the discriminating hyperplane for two groups using Principal Components (PCs).
lpda.pca(data, group, PC = 2, Variability = NULL)
data: Matrix containing the data. Individuals in rows and variables in columns.
group: Vector with the group variable.
PC: Number of Principal Components (PC) for PCA. By default it is 2. When the number of PCs is not decided, it can be determined by choosing the desired proportion of explained variability (Variability parameter).
Variability: Parameter for Principal Component (PC) selection. This is the minimum desired proportion of variability explained by the PCs of the variables. The analysis is always done with a minimum of 2 PCs. If it is NULL the PCA will be computed with the PC parameter.
loadings: Principal Component loadings.
scores: Principal Component scores.
var.exp: A matrix containing the explained variance of each component and the cumulative variance.
PCs: Number of Principal Components in the analysis.
Maria Jose Nueda, [email protected]
Nueda MJ, Gandía C, Molina MD (2022) LPDA: A new classification method based on linear programming. PLoS ONE 17(7): e0270403. <https://doi.org/10.1371/journal.pone.0270403>
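A minimal usage sketch, not part of the original manual, based on the signature above and the palmdates spectra data:
data(palmdates)
group = as.factor( c(rep("Spanish",11), rep("Foreign",10)) )
fit = lpda.pca(as.matrix(palmdates$spectra), group, Variability = 0.9)
fit$PCs      # number of components retained
fit$var.exp  # explained and cumulative variance per component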
lpdaCV
evaluates the classification error rate with a cross-validation procedure.
lpdaCV(data, group, scale = FALSE, pca = FALSE, PC = 2, Variability = NULL, CV = "loo", ntest = 10, R = 10, f1 = NULL, f2 = NULL)
data: Matrix containing the data. Individuals in rows and variables in columns.
group: Vector with the group variable.
scale: Logical indicating whether the data are standardised.
pca: Logical indicating whether a reduction of dimension is required.
PC: Number of Principal Components (PC) for PCA. By default it is 2. When the number of PCs is not decided, it can be determined by choosing the desired proportion of explained variability (Variability parameter).
Variability: Parameter for Principal Component (PC) selection. This is the desired proportion of variability explained by the PCs of the variables.
CV: Cross-validation mode: "loo" (leave-one-out) or "ktest" (leaves ntest samples in the test set).
ntest: Number of samples to evaluate in the test set.
R: Number of times the error is evaluated.
f1: Vector with weights for the individuals of the first group. If NULL they are equally weighted.
f2: Vector with weights for the individuals of the second group. If NULL they are equally weighted.
lpdaCV
The prediction error rate.
Maria Jose Nueda, [email protected]
data(RNAseq)
group = as.factor(rep(c("G1","G2"), each = 30))
lpdaCV(RNAseq, group, pca = TRUE, CV = "ktest", ntest = 2)
A data set with spectrometry measurements of 21 dates and concentration measurements of the substances that best define date quality: fibre, sorbitol, fructose, glucose and myo-inositol. The first 11 dates are Spanish (from Elche, Alicante) and the last 10 are from other countries, mainly Arabian.
palmdates
palmdates
A list with 2 elements:
conc: a data frame with 5 columns (fibre, sorbitol, fructose, glucose and myo-inositol).
spectra: a data frame with 2050 columns of spectral measurements.
Maria Jose Nueda, [email protected]
Abdrabo, S.S., Gras, L., Grindlay, G. and Mora, J. (2021) Evaluation of Fourier Transform-Raman Spectroscopy for palm dates characterization. Journal of Food Composition and Analysis. Submitted.
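A short exploration sketch, not part of the original manual, using the conc and spectra element names that appear in the lpda examples:
data(palmdates)
str(palmdates, max.level = 1)  # list with the conc and spectra elements
dim(palmdates$conc)            # 21 dates x 5 concentration variables
dim(palmdates$spectra)         # 21 dates x 2050 spectral variables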
PCA
computes a Principal Component Analysis, both when p > n and when p <= n.
PCA(X)
X: Matrix or data.frame with variables in columns and observations in rows.
eigen: An eigen-class object with the eigenvalues and eigenvectors of the analysis.
var.exp: A matrix containing the explained variance of each component and the cumulative variance.
scores: Scores of the PCA analysis.
loadings: Loadings of the PCA analysis.
Maria Jose Nueda, [email protected]
## Simulate a data matrix with 500 variables and 10 observations
datasim = matrix(sample(0:100, 5000, replace = TRUE), nrow = 10)

## PCA
myPCA = PCA(datasim)

## Extract the variance explained by each principal component
myPCA$var.exp
plot.lpda
is applied to an "lpda" class object. It shows a two-dimensional plot of the distance of each individual to the computed hyperplane, coloring each case by its real class.
## S3 method for class 'lpda' plot(x, PCscores = FALSE, xlim = NULL, main = NULL, legend.pos = "topright", ...)
x: Object of class inheriting from "lpda".
PCscores: Logical indicating whether to show the first 2 PC scores. Only possible when PCA is applied.
xlim: An optional vector of two values giving the x-axis range. If omitted, it will be computed.
main: An optional title for the plot.
legend.pos: The position of the legend. By default it is "topright". Use NULL when no legend is required.
...: Other arguments passed.
A two-dimensional plot representing the distance of each individual to the computed hyperplane, colored by the real class.
Maria Jose Nueda, [email protected]
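A minimal usage sketch, not part of the original manual, reusing the palmdates example from the lpda help page:
data(palmdates)
group = as.factor( c(rep("Spanish",11), rep("Foreign",10)) )
model.pca = lpda(palmdates$conc, group, pca = TRUE, PC = 2)
plot(model.pca, PCscores = TRUE, main = "Palmdates example", legend.pos = "topleft")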
Predict method for lpda classification
## S3 method for class 'lpda' predict(object, datatest = object$data,...)
object: Object of class inheriting from "lpda".
datatest: Optional data whose classes are to be predicted. If omitted, the original data are used.
...: Other arguments passed.
fitted: Predicted class.
eval: Evaluation of each individual in the fitted model.
Maria Jose Nueda, [email protected]
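A minimal usage sketch, not part of the original manual, reusing the palmdates example from the lpda help page:
data(palmdates)
group = as.factor( c(rep("Spanish",11), rep("Foreign",10)) )
model = lpda(palmdates$conc, group)
pred = predict(model)       # predictions for the training data
table(pred$fitted, group)   # confusion table
predict(model, datatest = palmdates$conc[1:5, ])$fitted  # illustration with a data subset as new data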
A simulated RNA-Seq dataset example.
RNAseq
RNAseq
A data frame with 600 variables (in columns) and 60 samples (rows).
This dataset is a simulated RNA-Seq example. Counts were simulated from a Negative Binomial distribution and transformed to RPKM (reads per kilobase per million mapped reads). It contains 600 genes (in columns) and 60 samples (in rows), 30 from each experimental group: the first 30 samples belong to the first group and the remaining 30 to the second.
Maria Jose Nueda, [email protected]
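A short exploration sketch, not part of the original manual, with the group labels used throughout the package examples:
data(RNAseq)
dim(RNAseq)  # 60 samples (rows) x 600 genes (columns)
group = as.factor(rep(c("G1","G2"), each = 30))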
stand
centers and scales a data matrix.
stand(X)
X: A data matrix with individuals in rows and variables in columns.
The scaled data matrix.
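A minimal usage sketch, not part of the original manual, on simulated data; column means and standard deviations of the result should be approximately 0 and 1 if the data are standardised as documented:
X = matrix(rnorm(50, mean = 5, sd = 2), nrow = 10)
Xs = stand(X)
round(colMeans(Xs), 8)       # ~0 for every column
round(apply(Xs, 2, sd), 8)   # ~1 for every column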
stand2
centers and scales a data matrix with the mean and standard deviation of another one.
stand2(X, X2)
X: The data matrix from which the mean and standard deviation are computed.
X2: The data matrix to center and scale.
The scaled X2 data matrix.
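A minimal usage sketch, not part of the original manual, scaling hypothetical test data with the training-set parameters:
Xtrain = matrix(rnorm(50, mean = 5, sd = 2), nrow = 10)
Xtest  = matrix(rnorm(25, mean = 5, sd = 2), nrow = 5)
Xtest.s = stand2(Xtrain, Xtest)  # Xtest centered and scaled with Xtrain's column means and sds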