Software Requirements Dataset

I am currently researching software requirements. If you come across more requirements documents, I would be very grateful if you could share them.

The paper associated with the dataset can be found here: I believe it is the first entry on the website, but you may find the other datasets useful as well. This dataset presents PURE (PUblic REquirements dataset), a set of 79 publicly available natural language requirements documents collected from the Internet. The dataset contains 34,268 sentences and can be used for natural language processing tasks typical of requirements engineering, such as model synthesis, abstraction identification, and document structure assessment. It can also be annotated to serve as a benchmark for other tasks, such as ambiguity detection, requirements categorisation, and identification of equivalent requirements. In the companion paper, we present the dataset and compare its language with that of generic English texts, showing the peculiarities of requirements jargon: a limited vocabulary of domain-specific words and acronyms, and long sentences. We also present the common XML format to which we have manually ported a subset of the documents, to facilitate the replication of NLP experiments. The XML documents can also be downloaded. Please cite this dataset as: Ferrari, A., Spagnolo, G. O., & Gnesi, S. (2017, September). PURE: A dataset of public requirements documents. In 2017 IEEE 25th International Requirements Engineering Conference (RE) (pp. 502-505). IEEE.

This thesis proposes and evaluates machine learning (ML)-based data models that identify and isolate software requirements in datasets containing statements from user app reviews. The ML models classify these statements into functional requirements (FR), non-functional requirements (NFR), and non-requirements (NR).

The proposed approach creates a new hybrid dataset that combines software requirements from Software Requirements Specification (SRS) documents with user app reviews. The Support Vector Machine (SVM), Stochastic Gradient Descent (SGD), and Random Forest (RF) ML algorithms, combined with Term Frequency-Inverse Document Frequency (TF-IDF) Natural Language Processing (NLP) features, were applied to the hybrid dataset. Each data model was evaluated on accuracy, precision, recall, and F1 score, and validated with 10-fold cross-validation. The approach successfully identifies and isolates software requirements, with SGD performing best at an accuracy of 83%. Overall, the thesis presents a comprehensive methodology for combining machine learning algorithms with NLP techniques to identify requirements in user app reviews with a high degree of accuracy. (Source: arXiv, "Identify functional and non-functional software requirements from user app reviews and requirements artifacts".)

Hi, the most cited software requirements dataset I've found so far is the TeraPROMISE dataset. Here is the link
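The thesis abstract quoted above pairs TF-IDF features with classifiers such as SGD and reports accuracy, precision, recall, and F1 under 10-fold cross-validation. Below is a minimal, dependency-free Python sketch of that kind of pipeline. It is not the thesis's code. It assumes a linear SVM trained by SGD on the hinge loss (the loss scikit-learn's `SGDClassifier` uses by default), in a one-vs-rest setup over the FR/NFR/NR labels. The tokenizer, the hyperparameters (`epochs`, `lr`, `reg`), and all function names are hypothetical choices for illustration. The SVM and Random Forest variants are not shown.

```python
import math
import random
import re
from collections import Counter

LABELS = ("FR", "NFR", "NR")  # functional, non-functional, non-requirement


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit (deliberately simple)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(docs):
    """Smoothed IDF, idf(t) = ln((1 + n) / (1 + df(t))) + 1, as in scikit-learn's default."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(doc, idf):
    """Sparse TF-IDF vector (dict), L2-normalised; terms not seen during fitting are dropped."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else vec


def train_sgd(X, y, epochs=30, lr=0.5, reg=1e-4, seed=0):
    """One-vs-rest linear SVMs trained by SGD on the L2-regularised hinge loss."""
    rng = random.Random(seed)
    weights = {label: {} for label in LABELS}
    bias = {label: 0.0 for label in LABELS}
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new random order each epoch
        for i in order:
            x = X[i]
            for label in LABELS:
                w = weights[label]
                target = 1.0 if y[i] == label else -1.0
                margin = target * (sum(w.get(t, 0.0) * v for t, v in x.items()) + bias[label])
                for t in w:  # L2 penalty gradient step: shrink every weight
                    w[t] *= 1.0 - lr * reg
                if margin < 1.0:  # hinge loss is active: step towards the target
                    for t, v in x.items():
                        w[t] = w.get(t, 0.0) + lr * target * v
                    bias[label] += lr * target
    return weights, bias


def predict(x, weights, bias):
    """Return the label whose one-vs-rest classifier scores highest."""
    scores = {
        label: sum(weights[label].get(t, 0.0) * v for t, v in x.items()) + bias[label]
        for label in LABELS
    }
    return max(scores, key=scores.get)


def cross_val_predict(docs, labels, k=10, seed=0):
    """k-fold cross-validation: each statement is predicted by a model trained without it."""
    idx = list(range(len(docs)))
    random.Random(seed).shuffle(idx)
    preds = [None] * len(docs)
    for f in range(k):
        held_out = idx[f::k]
        held = set(held_out)
        train = [i for i in idx if i not in held]
        idf = fit_idf([docs[i] for i in train])  # IDF from training folds only (no leakage)
        w, b = train_sgd([tfidf(docs[i], idf) for i in train], [labels[i] for i in train])
        for i in held_out:
            preds[i] = predict(tfidf(docs[i], idf), w, b)
    return preds


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores over FR, NFR and NR."""
    pairs = list(zip(y_true, y_pred))
    f1s = []
    for label in LABELS:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(f1s) / len(f1s)
```

Given a list of labelled statements, `macro_f1(labels, cross_val_predict(statements, labels))` yields a cross-validated score. The IDF weights are refit on the training folds of every split, so held-out statements never influence the features. The folds here are plain shuffled splits. A stratified split would keep the FR/NFR/NR proportions equal across folds, which matters when one class is rare.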