Frequently Asked Questions

On this page, I answer FAQs that many scientists have asked me. If you have a question that isn’t answered here, please feel free to email me or send me a Teams message!

This document is a work in progress! Last edited 2025-02-06.

How can I ask you for help?
What information should I provide in a help request?
How should I format my data to share with you?
What kinds of statistical analyses can you help with?
What does the help you provide look like?
Do you have any favorite methods or approaches?
What statistical programming languages do you use?
Can you help me with SciNet?
How fast can you get something back to me?
How do I give you credit for assisting with my analysis?
Can you help me make my science open and reproducible?
What about reviewing my five-year project plan?
How can I learn to do my own stats?
How can I schedule a stats training workshop at my location?

How can I ask you for help?

Please contact me by email (quentin dot read at usda dot gov) or Microsoft Teams message. I will do my best to get back to you as soon as possible!

What information should I provide in a help request?

I can provide you a better answer if I know what the goals of your research are, and if I know what your data look like. I would greatly appreciate it if your help request would include at least a few sentences describing the research, including the goal of the research and what specific research questions you are exploring/hypotheses you are testing. Also, if you can provide at least a sample of your raw data so that I can see what format it is in and what kind of variables we will be working with, that’s very helpful too. If you have anything like a field map or spreadsheet of treatment assignments that helps clarify the experimental design, that’s also helpful for me to look at.

With all of that said, don’t worry too much about providing every single possible piece of information. But if I have the info I need to help you ahead of time, it can make our consultation meetings much more efficient and productive!

How should I format my data to share with you?

It is much more efficient for both of us if you provide data in a format that is ready to be imported into statistical software like R or SAS. That maximizes the time I can spend helping you with data analysis, visualization, modeling, and storytelling. You know your data better than I do, so if you take the lead in cleaning and formatting the data, there is less potential for error.

I would prefer to have data in an analysis-ready format, which means:

Data in CSV or TXT format instead of XLSX. This is easier to read into statistical software and doesn’t depend on a proprietary Microsoft file format. Also, it guarantees that all information is provided in plain text instead of through formatting like highlighting cells.
Data are in row and column format. Each row is an observation and each column is a variable. Each column should contain only one data type (numeric or text).
There should be only one header row, no gaps between the header and the data, and identifier values should be filled in on every row rather than left blank below their first occurrence.
There should not be any additional rows or columns with summary statistics pre-calculated.

See this excellent guide to sharing data with a statistician for more details on the best format for sharing data.
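
The formatting points listed above can be partly checked by a short script before a file is shared. The following is a minimal sketch in Python using only the standard library, not a tool from this page: the function name `check_analysis_ready`, the `id_columns` argument, and the choice to treat blank cells and `NA` as missing values are all assumptions made for illustration. It cannot detect pre-calculated summary rows or information stored as cell highlighting.

```python
import csv
import io

MISSING = {"", "NA"}  # assumption: these codes are treated as missing values


def is_number(text):
    """Return True if the text parses as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def check_analysis_ready(csv_text, id_columns=()):
    """Return a list of problems found in CSV text (empty list if none)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    header, data = rows[0], rows[1:]
    problems = []

    # One header row: every column needs a unique, non-empty name.
    if any(name.strip() == "" for name in header):
        problems.append("header has an empty column name")
    if len(set(header)) != len(header):
        problems.append("header has duplicate column names")

    # Rows should be complete, with no blank spacer rows.
    complete = []
    for row_number, row in enumerate(data, start=2):
        if all(cell.strip() == "" for cell in row):
            problems.append(f"row {row_number}: blank row")
        elif len(row) != len(header):
            problems.append(f"row {row_number}: wrong number of cells")
        else:
            complete.append(row)

    for index, name in enumerate(header):
        cells = [row[index].strip() for row in complete]
        # Identifier columns should be filled in on every row.
        blanks = sum(cell == "" for cell in cells)
        if name in id_columns and blanks:
            problems.append(f"column '{name}': {blanks} empty identifier cell(s)")
        # Each column should be all numeric or all text.
        numeric = [is_number(cell) for cell in cells if cell not in MISSING]
        if any(numeric) and not all(numeric):
            problems.append(f"column '{name}': mixes numbers and text")
    return problems


# Made-up example data: a blank plot identifier and "n/a" typed into a numeric column.
example = "plot,treatment,yield\n1,control,5.2\n,fertilized,6.1\n2,control,n/a\n"
for problem in check_analysis_ready(example, id_columns=["plot"]):
    print(problem)
```

A file that passes these checks is not guaranteed to be analysis-ready, but one that fails them will almost certainly need cleaning first.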

What kinds of statistical analyses can you help with?

Power analysis and experimental design: The old adage is that the best time to consult a statistician is before you start the experiment. I certainly agree, and I am happy to help you determine whether your experimental design is sound and what kind of sample size you will need.
Data science (visualization, data cleaning): I can help you turn your raw data into a format that you can feed into a statistical model, or help you make graphs and tables to explore patterns in your data.
Analyzing experimental and observational data: This is my bread and butter. I can help you with many different kinds of statistical models from all kinds of studies, whether it is a designed experiment or observational dataset. This ranges from a simple ANOVA design to complex generalized linear mixed models with weird error structures and response distributions.
Geospatial data/statistics: I can help you make maps of your spatial data and fit statistical models that take the spatial pattern in the data into account.
Time series: Though I’m not an expert on time series, I’m learning more every day. I can help you with repeated measures models, forecasting techniques like ARIMA, and GAMs (generalized additive models).
Machine learning: Again, I am not an expert but I am constantly trying to learn new machine learning techniques. I can help you with models like random forest, as well as multivariate clustering techniques.
And more …: Feel free to ask me about any kind of model or analysis. If I can’t help you I will try my best to find people or resources that can give you the help you need.
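
As an illustration of the sample size question raised under power analysis in the list above, here is a minimal sketch for the simplest possible case: comparing the means of two independent groups with a two-sided test. It uses the standard normal approximation n = 2 (z_{1-α/2} + z_{power})² / d² per group, where d is the standardized effect size. The function name `n_per_group` is made up for this example, and the approximation runs slightly below the exact t-test answer. Designs with blocks, repeated measures, or non-normal responses usually call for simulation instead.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-group comparison of means.

    effect_size is the standardized difference: (difference in means) / (common SD).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Halving the detectable effect size roughly quadruples the required sample size.
for d in (1.0, 0.5, 0.25):
    print(d, n_per_group(d))
```

The main design lesson is visible in the formula itself: the required sample size grows with the inverse square of the effect size, so a realistic estimate of the expected effect matters more than any other input.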

What does the help you provide look like?

It runs the gamut from a quick email or 10-minute conversation to a collaboration that can last for months or years. I can answer questions you have, point you toward resources that can help you learn about the stats or models you need, or review code or text you’ve written to make sure it’s correct. If needed, I can do some analysis for you, or even take the lead on the entire data manipulation, analysis, and presentation workflow. It really depends on your needs. Every project is different! But no matter what, it is a “co-creation” process where we will work together to use your data to tell the story you want to tell.

Do you have any favorite methods or approaches?

I am a big proponent of Bayesian methods. They are more flexible and allow us to fit models that classical statistical approaches just can’t handle. Also, philosophically it’s a better way to approach science: classical statistics tries to reject or not reject a null hypothesis, which gives the false impression that the world is black and white and there are “yes or no” answers to our hypotheses about the world. Bayesian statistics is more about estimating the size of the effects and being honest about the level of uncertainty we have for any claim we make about the world. Of course, I know many people haven’t been trained in that approach, so I am happy to work with you to learn more about it. Even if you don’t become a Bayesian, it’s important to at least be familiar with the terminology and the ideas behind it because you will start to see it more and more in the literature as time goes on.

Whether we’re working with Bayesian or classical models, I really like GLMMs (generalized linear mixed models). They are a very flexible kind of model that allow us to work with data with all kinds of non-independence in space and time, and all kinds of distributions.

Bayesian stats and GLMMs are best for “small data” or medium-sized data. When it comes to big data, we have to move to machine learning approaches. As I said above, I am not an expert in those but I am excited to learn with you!
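
To make concrete the idea above of estimating the size of an effect together with its uncertainty, here is a minimal sketch of a Bayesian estimate for a single proportion. It uses a grid approximation in plain Python rather than Stan or brms, so it is a teaching toy and not a recommended workflow. The germination counts, the flat prior, and the function name `posterior_summary` are all assumptions made for illustration.

```python
def posterior_summary(successes, trials, grid_size=1001):
    """Posterior mean and 95% equal-tailed credible interval for a proportion.

    Assumes a flat prior on the proportion and a binomial likelihood.
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    # Unnormalized posterior = flat prior (constant) times binomial likelihood.
    weights = [p ** successes * (1 - p) ** (trials - successes) for p in grid]
    total = sum(weights)
    posterior = [w / total for w in weights]

    mean = sum(p * w for p, w in zip(grid, posterior))

    # Walk up the grid to find the 2.5% and 97.5% posterior quantiles.
    cumulative = 0.0
    lower = upper = None
    for p, w in zip(grid, posterior):
        cumulative += w
        if lower is None and cumulative >= 0.025:
            lower = p
        if upper is None and cumulative >= 0.975:
            upper = p
    return mean, lower, upper


# Hypothetical data: 14 of 20 seeds germinated.
mean, lower, upper = posterior_summary(14, 20)
print(f"estimated germination rate {mean:.2f}, 95% credible interval {lower:.2f} to {upper:.2f}")
```

Instead of a yes-or-no verdict on a null hypothesis, the result is a best estimate plus a range of plausible values, and the width of that range shows directly how much 20 seeds can and cannot tell us.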

What statistical programming languages do you use?

I primarily use R. If I do an analysis “from scratch” for a scientist, I usually do the analysis in R and write it up as an R Markdown notebook. That’s a document that includes code, output, figures, and explanatory text all in one place. I find that this is the best way to share my work with scientists.

What R packages do I usually use? I do most data manipulation using tidyverse but also use data.table for larger datasets. I use modeling packages like glmmTMB and lme4 for classical statistical analyses, and Stan software coupled with the R packages brms and tidybayes for Bayesian analyses. emmeans and easystats are great packages for supporting all kinds of analyses. For geospatial data stuff, I use the sf package in R as well as occasionally using GDAL and GEOS on the command line.

I am also somewhat experienced with SAS and capable of helping you with your SAS code, as well as Python to a lesser extent. I can also help you with your JMP analysis.

Can you help me with SciNet?

Yes, I have some experience using SciNet and other high-performance computing clusters, and I can probably provide you some help. But for more involved requests, I’d recommend getting in touch with folks from GBRU, asking your question on the SciNet forums, or contacting the excellent support staff at VRSC directly.

As of 2024, I am a member of the SciNet advisory committee, serving as a liaison between the scientists that use SciNet and the people that run the system and provide trainings. So if you have any requests for training on SciNet, feel free to get in touch with me and I can pass that info on to the SciNet office. Or check out the user guides on the SciNet page!

How fast can you get something back to me?

I have a lot of ongoing commitments to help out scientists at any given time. I work on a first-come, first-served basis. But I do want to make regular progress on all the projects. So I cycle through all my currently active projects and work on each one for a chunk of time. Currently I’m able to work on each project roughly every 1-2 weeks. Ideally, I would make enough progress each time to send you an update. But typically I will only be able to devote a small percentage of my time to a specific project in any one week. Feel free to email me at any time with questions or clarifications; again, I’ll address those on a first-come, first-served basis as they come in.

Of course, I am willing to make exceptions if there is a rush deadline. But the sooner you can let me know, the better, so that I can plan accordingly.

How do I give you credit for assisting with my analysis?

I do not have a formal publication quota, but I am informally evaluated in part based on the publications and presentations I co-author. Of course, my contribution will vary a lot from project to project. Please consider adding me as a co-author on any paper or other product where I’ve made a meaningful contribution to the analysis, presentation, and/or writing. This is a good idea because it makes me formally accountable for the analysis I did or helped you do. If I am a co-author, I promise to hold up my end of the bargain and write any sections for which I am responsible, including creating figures and tables. Like any good co-author, I will review and give comments on the entire text of the manuscript. I’ll pay special attention to making sure statements in the abstract and discussion are supported by the analysis results. But if my contribution to your project is just a quick consultation or question-and-answer session, co-authorship is not necessary. An informal acknowledgment would be great!

Can you help me make my science open and reproducible?

Yes! I am passionate about promoting open and reproducible science in ARS. It’s especially important now that the White House has mandated we make all our data publicly available. That should also include the code that turns raw data into a final product with the results of an analysis. What does this look like in practice?

1. Versioning and sharing analysis code

I encourage the use of GitHub for keeping a master version of your data processing and statistical analysis code. This is a great idea both when you’re working by yourself, to keep track of what you’re working on and to avoid reinventing the wheel with each new project, and when you’re working with others, to share code and work on the same code together. I will help you create a private GitHub repository where we can share code and collaborate on our project. ARS has a GitHub Enterprise account that we can use for this.

2. Publishing data on a repository

When it is time to publish a manuscript, I will help you archive the data associated with the publication, and ideally also the analysis code, on a public repository. Usually, unless you are dealing with specialized data types such as sequence data, general-purpose small datasets should be published on Ag Data Commons, the USDA’s own data repository. Publishing the data will ensure that the code and data we produce at USDA provide the biggest possible benefit to society. Here are the steps that I usually follow along with my ARS scientist collaborators. Ideally the scientist would take the lead in this process with my support, but in some cases I do the bulk of these steps myself. Notice that a separate ARS-115 form is submitted for the dataset in addition to the one for the manuscript or presentation. That’s a good thing because it helps ensure we get credit for the additional work and time it takes to publish data and make our science open and reproducible!

Put together CSV files of the data, notebook of the analysis code including documentation, and Readme files explaining what all the columns in all the data files represent.
Create a new dataset on Ag Data Commons and upload the files.
Fill out all metadata fields on Ag Data Commons. (At this point you may reserve a DOI that you can put in the manuscript even if the data have not yet been published to that DOI.)
When the scientist submits the manuscript to the journal, they include a placeholder data availability statement at the end of the materials and methods section, such as “All data and code required to reproduce the analyses presented here are archived on Ag Data Commons (DOI #####).”
Submit a separate ARS-115 form for the dataset. (This step and the following steps can be done when the MS is already in revision or accepted at the journal.)
Once the dataset’s ARS-115 form is approved, submit the Ag Data Commons repository for curation. A data curator at the National Ag Library will review it and give comments. Usually this only takes a few days and the comments usually don’t take that much work to address.
After the curator’s feedback is addressed, the curator publishes the dataset on Ag Data Commons. The dataset gets an official DOI which can be cited.
When the lead author gets a chance to do the final revision of the MS (this can be post acceptance at the page proofs stage), if you didn’t reserve a DOI previously, add the DOI in place of the “#####” in the placeholder statement. The MS may also include a formal citation to the dataset.
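
The first step above calls for Readme files explaining what every column in every data file represents. A starting point for such a file can be generated from the CSV itself, as in the following minimal sketch. The function name `readme_skeleton`, the layout of the output, and the example file name and columns are made up for illustration and are not an Ag Data Commons requirement. The descriptions and units still have to be written by someone who knows the data.

```python
import csv
import io


def _is_number(text):
    """Return True if the text parses as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def readme_skeleton(file_name, csv_text):
    """Build a plain-text Readme stub listing every column in one CSV file."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    lines = [f"File: {file_name}", f"Rows: {len(rows)}", "Columns:"]
    for column in reader.fieldnames or []:
        # Ignore empty cells when guessing the column type.
        values = [row[column] for row in rows if row[column] not in ("", None)]
        kind = "numeric" if values and all(_is_number(v) for v in values) else "text"
        lines.append(f"  {column} ({kind}): TODO describe this variable and its units")
    return "\n".join(lines)


# Hypothetical two-row dataset.
print(readme_skeleton("yield.csv", "plot,yield_kg\nA,5.2\nB,6.1\n"))
```

Generating the stub from the data file means no column can be forgotten, which is one of the more common comments a data curator makes.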

What about reviewing my five-year project plan?

I am officially responsible for reviewing all 5-year CRIS project plans for the Southeast Area. The program analysts send me the preplans for each review cycle and I provide comments and feedback, primarily focusing on the experimental design, proposed statistical analyses, and data analysis/management parts of the plans. But if you want to get a head start on the process, I can help at an earlier stage if you send me questions about specific elements of your plan such as experimental design or power calculations.

The reason I review the plans is not only to help you get a better score with the OSQR panel. My suggestions are non-binding and you have no obligation to modify the text of the preplan based on my input. But even if you don’t make any changes to the text, it might make you aware of issues around experimental design or statistical power that are important to address before you start collecting data! Of course, I will be willing to help you out at any stage of the experimental design, data collection, and statistical analysis process.

Incidentally, it isn’t necessary to list me as a collaborator on your preplan. I am always available to provide statistical support to SEA scientists, whether or not my name appears on your plan.

Quentin, it’s great that you’re there to help, but I see there is only one of you for hundreds of scientists. How can I learn to do my own stats?

Learning stats is a journey and a process. You can’t learn it overnight. However, I would recommend starting at my SEAStats training page for a gentle introduction to both the statistical models and the tools in R you will need to work with them. On that page I also have links to other helpful tutorials and learning resources. Also, check out the free online training page on SciNet that my area statistician colleagues Sara and Kathy put together with tons of resources!

How can I schedule a stats training workshop at my location?

If you would like me to teach a workshop on a topic related to statistics, data science, or statistical programming, I am available for either in-person workshops at SEA locations or virtually via Teams/Zoom.

There are three ways I can do a workshop: teach a lesson/short presentation from my lesson page, teach from someone else’s tutorial or learning resource, or do a “bespoke” lesson on a topic of your choice.

1. Lessons and presentations on the SEAStats page

I can teach one or more of the lessons that are already on the SEAStats training page. Here is a rundown of what’s currently available there. For the “multi-part” lessons, we can do all parts or only a subset depending on how much time is available. You can also see a list of talks and presentations on the SEAStats page.

Lesson	Length
R Boot Camp: the basics of R programming and working with dataframes	2 lessons, 90 minutes each
Mixed Models in R: linear mixed models in R, including simple GLMMs and emmeans	4 lessons, 90 minutes each
ggplot2 Basics: a brief introduction to the ggplot2 plotting package in R	1 lesson, 90 minutes
Bayesian Mixed Models with brms: introduction to Bayesian stats, with a mixed model example in the brms R package	3 lessons, 2-3 hours each
R for SAS Users: intro to mixed models in R, assuming a SAS background	3 lessons, 90 minutes each
Machine Learning Demystified: introduction to machine learning for curious scientists, with examples in R	1 lesson, 2 hours
MultiOmics Demo: introduction to the idea of combining multiple ‘omics datasets into one analysis, with examples in R/Python	1 lesson, 3 hours

2. Teach from a publicly available online tutorial or training resource

There are lots of great resources available for learning about stats, data science, and scientific programming. I would be happy to lead a workshop where we go through an existing tutorial together. If there is enough interest, we could even work through an entire book in a series of workshops. Here are a few examples, but feel free to suggest one of your own.

3. Bespoke lesson on a topic of your choice

If you are interested in a topic that you cannot find a good learning resource for, I can probably help you find some good resources and lead a workshop where we go through them together. I am also open to developing new lessons if you give me enough advance notice!
